D754-D760 Nucleic Acids Research, 2012, Vol. 40, Database issue 
doi:10.1093/nar/gkr1126



Published online 1 December 2011 



Rhea— a manually curated resource of 
biochemical reactions 

Rafael Alcantara1,*, Kristian B. Axelsen2, Anne Morgat2,3, Eugeni Belda4,
Elisabeth Coudert2, Alan Bridge2, Hong Cao1, Paula de Matos1, Marcus Ennis1,
Steve Turner1, Gareth Owen1, Lydie Bougueleret2, Ioannis Xenarios2,5 and
Christoph Steinbeck1

1Chemoinformatics and Metabolism Team, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD,
UK, 2Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4,
Switzerland, 3Equipe BAMBOO, INRIA Grenoble Rhone-Alpes, 655 avenue de l'Europe, F-38330 Montbonnot
Saint-Martin, France, 4Genoscope-LABGeM, CEA, 2 Rue Gaston Cremieux, CP 5706, F-91057 Evry,
France and 5Vital-IT, Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, CH-1015
Lausanne, Switzerland

Received August 18, 2011; Revised November 4, 2011; Accepted November 8, 2011 



ABSTRACT 

Rhea (http://www.ebi.ac.uk/rhea) is a comprehen- 
sive resource of expert-curated biochemical 
reactions. Rhea provides a non-redundant set of 
chemical transformations for use in a broad 
spectrum of applications, including metabolic 
network reconstruction and pathway inference. 
Rhea includes enzyme-catalyzed reactions 
(covering the IUBMB Enzyme Nomenclature list),
transport reactions and spontaneously occurring re- 
actions. Rhea reactions are described using 
chemical species from the Chemical Entities of 
Biological Interest ontology (ChEBI) and are stoi- 
chiometrically balanced for mass and charge. They 
are extensively manually curated with links to 
source literature and other public resources on me- 
tabolism including enzyme and pathway databases. 
This cross-referencing facilitates the mapping and 
reconciliation of common reactions and compounds 
between distinct resources, which is a common first 
step in the reconstruction of genome scale metabol- 
ic networks and models. 



RHEA AIMS AND SCOPE 

Rhea is a freely available and comprehensive resource of 
expert-curated biochemical reactions. It has been designed 
to provide a non-redundant set of chemical transformations 
for applications such as the functional annotation of
enzymes, pathway inference and metabolic network
reconstruction. Rhea provides explicit representations of
biochemical reactions using chemical species from the Chemical
Entities of Biological Interest ontology (ChEBI) (1) that
include structural information and curated links to a host
of other resources. Rhea reaction descriptions cover the
official list of enzyme-catalyzed reactions defined by the
Nomenclature Committee of the IUBMB (NC-IUBMB)
(http://www.chem.qmul.ac.uk/iubmb/enzyme/) (2,3). This
extends where possible to the provision of explicit descrip- 
tions for specific instances of generic reactions that are 
catalyzed by enzymes with broad substrate specificity (as 
well as reactions that are only referred to within the free 
text comments of IUBMB entries). One example of such a
generic reaction is that catalyzed by alcohol dehydrogenase
(EC 1.1.1.1), which is described in the following way within
the IUBMB classification:

An alcohol + NAD+ = an aldehyde or ketone + NADH

Within Rhea, a generic reaction description is provided 
that corresponds to this textual description, along with 
specific reactions for all known substrate/product pairs. 
A similar approach is used to describe reactions involving 
polymers with varying numbers of repeated units, such as 
isoprenoid quinones, which participate in the reaction 
catalyzed by malate dehydrogenase (EC 1.1.5.4): 

(S)-malate + a menaquinone = oxaloacetate + a menaquinol

This reaction involves a menaquinone with a variable
number of isoprene units (4): eight isoprene units in
Escherichia coli (menaquinone-8, MK-8), seven isoprene
units in certain species of the genus Bacillus and nine in
species of Streptococcus. Each of these reactions has a
precise description within Rhea, which specifies the
number of repeat units within the menaquinone.

[Figure 1, panel A. EC 1.1.1.159: 3α,7α,12α-trihydroxy-5β-cholanate + NAD+ = 3α,12α-dihydroxy-7-oxo-5β-cholanate + NADH + H+.
EC 3.1.2.27: choloyl-CoA + H2O = cholate + CoA. CHEBI:29747, common name: cholate.
Panel B. RHEA:22580 (EC 4.2.1.80): 4-hydroxy-2-oxopentanoate = 2-oxopent-4-enoate + H2O.
RHEA:31295: keto-enol tautomerization.
RHEA:27873 (EC 3.7.1.13): (2E,4E)-6-(2-aminophenyl)-2-hydroxy-6-oxohexa-2,4-dienoate + H2O <=> (2E)-2-hydroxypenta-2,4-dienoate + anthranilate + H+.]

Figure 1. Chemical compound issues. (A) The same chemical compound can be described using different names in textual representations of
reactions in the IUBMB classification. (B) A chemical compound may exist in different forms that can interchange spontaneously, such as keto
and enol tautomers, and reactions may include each of these forms. To illustrate this, RHEA:22580 describes a reaction involving a keto tautomeric
form while RHEA:27873 describes a separate reaction involving an enolic form of the same compound. The keto-enol tautomerization reaction
RHEA:31295 allows the two reactions to be linked if necessary.

In addition to providing specific descriptions of known
instances of generic reactions, Rhea also provides transport
reactions and spontaneous reactions that lack a corresponding
textual definition in the IUBMB classification.
These reactions are designed to facilitate the use of Rhea
as a reference resource for genome-scale metabolic
network reconstruction, where precise definitions of substrate
and product specificity, and the inclusion of spontaneous
and transport reactions, are essential steps in
building a functional genome-scale model for metabolism
(5-7). More generally, the use of explicit reaction descriptions
facilitates the precise identification, mapping and
comparison of compounds and reactions between different
resources and different metabolic models (8). This can
be difficult to achieve using the systematic unambiguous
chemical nomenclature of IUPAC (International Union of
Pure and Applied Chemistry), as biologists often prefer
common names to IUPAC standardized labels (which
can be relatively complex). The inconsistent use of
common names and standardized nomenclature can lead
to ambiguity in reaction and compound descriptions that
requires manual curation to resolve, as illustrated in
Figure 1A, where a single compound is referred to variously
as cholate and 3α,7α,12α-trihydroxy-5β-cholanate
within two reaction descriptions. A further
source of ambiguity arises from the use of generic
compound labels that are intended to describe more
than one chemical species, such as NAD(P)/NAD(P)H,
which is often used in oxidoreduction reactions that use
NAD+/NADH or NADP+/NADPH redox couples. Rhea
provides an explicit description of each of the corresponding
reactions, allowing unambiguous assignment of reactions
including NAD+/NADH or NADP+/NADPH. A
third example of ambiguity may occur when considering
reactions involving keto-enol tautomers. Rhea provides
spontaneous tautomerization reactions that can be used
to link reactions involving tautomeric forms of the same
compound, such as RHEA:31295, which connects reactions
involving (2E)-2-hydroxypenta-2,4-dienoate (enol
form) and 2-oxopent-4-enoate (keto form) (Figure 1B).

Although Rhea attempts to reduce or eliminate ambi- 
guity in reaction descriptions wherever possible, some al- 
lowance is made for incomplete knowledge of reaction 
chemistry. Rhea provides incomplete reactions where not 
all the reactants are known, and where the reactions are 
not necessarily balanced. These reactions are clearly 
identified by their 'preliminary' status. 

Reaction representation in Rhea 

Rhea provides explicit representations of biochemical re- 
actions using chemical species from the ChEBI ontology 
(1). ChEBI includes information on chemical formula and 
charge, as well as nomenclature and 2D-structural information
in various chemical formats. The latter information
is used by Rhea for reaction search, display,
and export, as well as validation and balancing. Rhea
reactions (like their constituent ChEBI entities) are
manually curated and linked to a host of other resources,
including underlying structural information.

Table 1. The master reaction RHEA:15133 (shown in Figure 3) has an undefined direction, represented with the symbol <?>. It is associated
with three directional reactions (RHEA:15134, RHEA:15135 and RHEA:15136), each of which has a specific set of corresponding
cross-references to external databases

Reaction and direction                                             Cross-references
-----------------------------------------------------------------  ------------------------------------
RHEA:15133 (master reaction)                                       None
  L-glutamate + H2O + NAD+ <?> 2-oxoglutarate + H+ + NADH + NH4+

RHEA:15134 - Left-to-Right (RHEA:15133, LR)                        UniPathway:UER00591
  L-glutamate + H2O + NAD+ => 2-oxoglutarate + H+ + NADH + NH4+    Reactome:REACT_710.4

RHEA:15135 - Right-to-Left (RHEA:15133, RL)                        Reactome:REACT_1896.4
  2-oxoglutarate + H+ + NADH + NH4+ => L-glutamate + H2O + NAD+

RHEA:15136 - Bidirectional (RHEA:15133, BI)                        UniPathway:UCR00243
  L-glutamate + H2O + NAD+ <=> 2-oxoglutarate + H+ + NADH + NH4+   KEGG:R00243
                                                                   MetaCyc:GLUTAMATE-DEHYDROGENASE-RXN
                                                                   IntEnz:EC 1.4.1.2
                                                                   IntEnz:EC 1.4.1.3
                                                                   UniProt:DHE2_PORG3
                                                                   UniProt:DHEA_NICPL, ...

Each possible reaction in Rhea is represented by a 
unique 'master reaction' that is independent of any bio- 
logical context, having no associated directional informa- 
tion. Each such master reaction has the following 
attributes: 

• A unique identifier 

• Two reaction parts (left and right) 

• A set of qualifiers that describe the type of reaction 
and if it is balanced 

• A curation status (approved, preliminary or obsolete) 

Each of the two reaction parts (arbitrarily defined as left
and right) is composed of a set of participant compounds,
their stoichiometric coefficients and possibly their localization.
The compounds are defined by a ChEBI identifier, a
name, a chemical formula, a net charge and possibly a 2D
structure. The coefficient may be an integer or a symbolic
expression (n, n+1, n-1, etc.). For transport reactions,
nominal cellular compartments (localization) are specified
by the tokens (in) and (out) using the side convention of
NC-IUBMB.
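
The attributes listed above can be pictured as a small data model. The following is a minimal sketch only, not the Rhea schema: the class and field names are hypothetical, the 2D structure is left out, and the ChEBI identifiers, formulae and charges for the RHEA:15133 participants (other than CHEBI:15378 and CHEBI:57540, which are quoted in the text) are hand-entered assumptions that should be checked against ChEBI. The 'approved' status and qualifier are likewise assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Union


@dataclass(frozen=True)
class Compound:
    chebi_id: str   # ChEBI identifier, e.g. "CHEBI:15378"
    name: str       # display label used in the equation
    formula: str    # chemical formula of the species at pH 7.3
    charge: int     # net charge of that species


@dataclass(frozen=True)
class Participant:
    compound: Compound
    coefficient: Union[int, str] = 1   # integer or symbolic ("n", "n+1", ...)
    location: Optional[str] = None     # "in" / "out" for transport reactions

    def label(self) -> str:
        text = self.compound.name
        if self.location:
            text += f"({self.location})"
        return text if self.coefficient == 1 else f"{self.coefficient} {text}"


@dataclass
class MasterReaction:
    rhea_id: int
    left: List[Participant]
    right: List[Participant]
    qualifiers: Set[str] = field(default_factory=set)
    status: str = "preliminary"        # approved, preliminary or obsolete

    def equation(self) -> str:
        # "<?>" marks the undefined direction of a master reaction
        left = " + ".join(p.label() for p in self.left)
        right = " + ".join(p.label() for p in self.right)
        return f"{left} <?> {right}"


def part(chebi_id: str, name: str, formula: str, charge: int) -> Participant:
    """Shorthand for a participant with coefficient 1 and no localization."""
    return Participant(Compound(chebi_id, name, formula, charge))


# Hand-entered illustration values (assumptions, see lead-in).
rhea_15133 = MasterReaction(
    rhea_id=15133,
    left=[part("CHEBI:29985", "L-glutamate", "C5H8NO4", -1),
          part("CHEBI:15377", "H2O", "H2O", 0),
          part("CHEBI:57540", "NAD+", "C21H26N7O14P2", -1)],
    right=[part("CHEBI:16810", "2-oxoglutarate", "C5H4O5", -2),
           part("CHEBI:15378", "H+", "H", 1),
           part("CHEBI:57945", "NADH", "C21H27N7O14P2", -2),
           part("CHEBI:28938", "NH4+", "H4N", 1)],
    qualifiers={"chemically balanced"},
    status="approved",
)
print(rhea_15133.equation())
```

Keeping the display label separate from the formula and charge mirrors the point made later in the text that a label such as NAD+ need not show the charge state of the underlying species.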

Each master reaction is uniquely represented at an arbitrarily
chosen pH of 7.3. Rhea uses the Marvin pKa
calculator from ChemAxon (http://www.chemaxon.com)
to select the major species of each compound found at
pH 7.3 and at a temperature of 298 K (9). For some compounds,
such as phosphate, the major species at pH 7.3
(which is HPO4(2-)) may be present in only a slight excess
over other minor species (such as H2PO4-), so the reaction
representation is a simplification of the actual state.
However, the selection of the major species serves to guarantee
data consistency, since the same protonation state
will be used in all reactions (and the reactions are fully
balanced, including protons). This convention is in accord
with that adopted by some existing resources, such as
MetaCyc (10), but contrasts with that adopted by
others, such as KEGG (11) and BRENDA (12), which
generally choose a neutral form for compound representation.
This latter convention has the advantage of being
simple but prevents the balancing of reactions for charge
and hydrogen atoms. Note that the export formats
provided by Rhea allow users to compute structures at
other pH values using commonly available
chemoinformatics tools.
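
The "slight excess" of the major phosphate species can be made concrete with the Henderson-Hasselbalch relation. The sketch below is a rough two-species estimate, not the Marvin calculation used by Rhea: it assumes a textbook second pKa of about 7.2 for phosphoric acid and ignores H3PO4 and PO4(3-), which are negligible near neutral pH.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Fraction of an acid/base pair present as the deprotonated form."""
    ratio = 10 ** (ph - pka)          # [base] / [acid], Henderson-Hasselbalch
    return ratio / (1 + ratio)


PKA2_PHOSPHATE = 7.2  # assumed textbook value for H2PO4- <=> HPO4(2-) + H+

frac = fraction_deprotonated(7.3, PKA2_PHOSPHATE)
print(f"HPO4(2-): {frac:.0%}, H2PO4-: {1 - frac:.0%}")
# prints: HPO4(2-): 56%, H2PO4-: 44%
```

Under these assumptions the two species are present in roughly a 5:4 ratio, which is why a single representative protonation state is a simplification.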

To ensure data consistency, the following additional 
constraints are supported by the Rhea infrastructure: 

• The same compound (ChEBI ID, localization) cannot
occur on both sides of a reaction. This results in the
exclusion of certain common species, such as Mg2+ in
Mg2+-ATP.

• The reaction must be chemically balanced for mass 
and charge, i.e. the compounds found in the left and 
right parts of the equation must have the same total 
number of atoms of each type and the same net 
charge. 

• The reaction must be unique. To ensure this uniqueness,
a fingerprint is computed based on the compounds
on both reaction parts (ChEBI ID,
coefficient, localization).
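
The three constraints above can be sketched in a few lines. This is an illustration under stated assumptions, not the validator shipped with Rhea: the function names are hypothetical, participants are plain tuples with integer coefficients, the formula parser handles only simple formulae without brackets, and making the fingerprint independent of left/right orientation is a design choice of this sketch. The formulae, charges and most ChEBI identifiers for RHEA:15133 are hand-entered and should be verified against ChEBI.

```python
import re
from collections import Counter

ATOM = re.compile(r"([A-Z][a-z]?)(\d*)")

# Participant tuple: (chebi_id, coefficient, localization, formula, charge)
LEFT = [("CHEBI:29985", 1, None, "C5H8NO4", -1),        # L-glutamate
        ("CHEBI:15377", 1, None, "H2O", 0),             # H2O
        ("CHEBI:57540", 1, None, "C21H26N7O14P2", -1)]  # NAD+
RIGHT = [("CHEBI:16810", 1, None, "C5H4O5", -2),        # 2-oxoglutarate
         ("CHEBI:15378", 1, None, "H", 1),              # H+
         ("CHEBI:57945", 1, None, "C21H27N7O14P2", -2), # NADH
         ("CHEBI:28938", 1, None, "H4N", 1)]            # NH4+


def atoms(formula):
    """Count atoms in a simple formula such as 'C5H8NO4'."""
    counts = Counter()
    for element, n in ATOM.findall(formula):
        counts[element] += int(n or 1)
    return counts


def totals(side):
    """Total atom counts and net charge of one reaction part."""
    atom_total, charge_total = Counter(), 0
    for _id, coeff, _loc, formula, charge in side:
        for element, n in atoms(formula).items():
            atom_total[element] += coeff * n
        charge_total += coeff * charge
    return atom_total, charge_total


def is_balanced(left, right):
    """Constraint 2: same atoms of each type and same net charge."""
    return totals(left) == totals(right)


def shares_compound(left, right):
    """Constraint 1: no (ChEBI ID, localization) pair on both sides."""
    return bool({(p[0], p[2]) for p in left} & {(p[0], p[2]) for p in right})


def fingerprint(left, right):
    """Constraint 3: key built from (ChEBI ID, coefficient, localization)."""
    def side(parts):
        return frozenset((p[0], p[1], p[2]) for p in parts)
    return frozenset({side(left), side(right)})


print(is_balanced(LEFT, RIGHT), shares_compound(LEFT, RIGHT))
```

Two reactions with equal fingerprints would be treated as duplicates; because the two parts are "arbitrarily defined as left and right", the sketch gives the same fingerprint when the sides are swapped.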

Each unique master reaction is associated with three
directional reactions (forward, reverse and bidirectional,
i.e. reversible), each of which has a distinct identifier
(Table 1). (Note that the validation steps described
above for the master reaction are automatically applied
to each of these directional reactions.) Rhea is therefore
redundant, in the sense that each reaction has four distinct
(validated) representations, but this feature allows directional
reactions from external resources to be linked to the
appropriate directional reaction in Rhea (Table 1). The
drawback of this approach is that Rhea may include directional
reactions that are unfeasible in biological systems
according to current knowledge.
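
The link between a master reaction, its directional forms and their cross-references can be sketched as two lookup tables. The data below are transcribed from Table 1 (a subset of the cross-references); the dictionary layout, the direction codes as keys and the function names are assumptions made for illustration, not part of Rhea.

```python
from typing import Dict, List, Optional, Tuple

# Master reaction -> directional reactions (from Table 1).
DIRECTIONAL: Dict[int, Dict[str, int]] = {
    15133: {"LR": 15134, "RL": 15135, "BI": 15136},
}

# Directional reaction -> cross-references (subset of Table 1).
XREFS: Dict[int, List[str]] = {
    15134: ["UniPathway:UER00591", "Reactome:REACT_710.4"],
    15135: ["Reactome:REACT_1896.4"],
    15136: ["UniPathway:UCR00243", "KEGG:R00243",
            "MetaCyc:GLUTAMATE-DEHYDROGENASE-RXN"],
}


def master_of(rhea_id: int) -> Optional[int]:
    """Return the master reaction for a master or directional identifier."""
    if rhea_id in DIRECTIONAL:
        return rhea_id
    for master, forms in DIRECTIONAL.items():
        if rhea_id in forms.values():
            return master
    return None


def resolve_xref(xref: str) -> Optional[Tuple[int, str, int]]:
    """Map an external reference to (master, direction, directional id)."""
    for master, forms in DIRECTIONAL.items():
        for direction, rhea_id in forms.items():
            if xref in XREFS.get(rhea_id, []):
                return master, direction, rhea_id
    return None


print(resolve_xref("KEGG:R00243"))   # (15133, 'BI', 15136)
```

Attaching each external reference to a directional form rather than to the master is what lets a directed entry in another resource map to exactly one Rhea identifier.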
Figure 2. The relationships between Rhea and databases of chemical
compounds, proteins and enzyme classification. ChEBI provides information
on chemical compounds for the creation of chemical reactions
in Rhea. The resulting Rhea reactions are used in IntEnz to describe
the enzymes catalyzing the reactions. ENZYME is generated from
IntEnz data and currently uses the textual representation of reactions
also used by the IUBMB EC list. Also shown are the cross-references
provided by Rhea to other resources. These are generated manually
and computationally.



Rhea curation 

Each master reaction has a curation status, which is one of 
approved, preliminary or obsolete. To be approved, a 
reaction has to fulfill the constraints described in the 
previous section relating to uniqueness and balance.
Reactions can also have descriptive qualifiers such as 
'chemically balanced', 'transport', 'class of reaction' and 
'polymerization'. 

The manual curation process includes verification of the 
selected ChEBI entities (assisted by chemoinformatic 
tools), the addition of original literature citations, and 
the addition of cross references to other resources. Rhea 
citations are managed in the EBI's CiteXplore bibliography
database (http://www.ebi.ac.uk/citexplore/). Rhea
cross-references several resources describing biochemical 
reactions including KEGG (11), EcoCyc/MetaCyc 
(10,13) and UniPathway (14). As described above, these 
cross-references link specific directional instances of reac- 
tions in Rhea to those of other resources. Cross-references 
are also added automatically to Reactome (15) and 
MACiE (16) on the basis of their reaction participants 
(ChEBI compound) and the indicated reaction direction. 

An important goal of the Rhea project is to link the 
chemical information from ChEBI to that in resources 
describing enzymatic reactions, such as IntEnz 
(Integrated relational Enzyme database) (17) that 
contains data on enzymes organized by EC numbers 
(Figure 2). Rhea now provides the chemical representation 
for enzyme-catalyzed reactions in IntEnz (from which 
Rhea is cross-referenced). Rhea also includes cross- 
references to EC numbers, thus providing an entry 
point to IntEnz and enzyme classification. EC numbers 
are subsequently used to propose cross-references to 
protein sequences in the UniProtKB/Swiss-Prot 
knowledgebase (18). 



Rhea curation may also include the selection of 
descriptive names or labels for the participating ChEBI 
compounds, in order to provide a more easily understand- 
able (human readable) reaction description. It is import- 
ant to note that ChEBI and Rhea labels for a specific 
compound may differ. Rhea makes use of labels marked 
'UniProt synonym' in ChEBI (these labels are so named as 
they will in future provide a controlled vocabulary for 
chemical compounds in UniProtKB). To illustrate this, 
CHEBI:15378 has the ChEBI common name hydron
whereas Rhea uses the label H+. This practice of selecting
easily understood chemical labels for Rhea means that the 
displayed label for certain chemical species will not neces- 
sarily show the correct charge state (a typical example 
being NAD+, CHEBI:57540, which is actually negatively 
charged). However, the underlying compounds are cor- 
rectly charged and the reaction is appropriately balanced. 

Comparison of Rhea and related resources 

When the Rhea project was initiated, the only freely avail- 
able database of reactions was the KEGG LIGAND 
database. During the intervening period, several addition- 
al resources containing information on chemical com- 
pounds and reactions have become available (while 
KEGG data now requires a subscription for download) 
(Table 2). Some of these newly available resources focus 
on enzymes catalyzing chemical reactions [e.g. BRENDA 
(12), ExplorEnz (2), ENZYME (3) and the aforementioned
IntEnz (17)], while others provide detailed information on
reaction mechanisms [e.g. EzCatDB (19), MACiE (16)] or 
kinetic data [e.g. BRENDA (12), SABIO-RK (20)]. 
Resources that provide comprehensive reaction data and 
are comparable in scope with Rhea include BioPath (21), 
KEGG (11) and MetaCyc (10). 

Submission to Rhea 

Rhea welcomes submissions describing reactions that are 
not currently available in Rhea. All new reaction submissions
should be posted on our SourceForge Reaction
Requests/Updates tracker (https://sourceforge.net/projects/rhea-ebi/)
with relevant information including
ChEBI identifiers for each reaction participant and
cross-references to other relevant databases and source
literature where available.



Rhea content 

At the time of writing, Rhea (release 24) includes 4321 
master reactions (each associated with three directional 
reactions) that involve 3788 distinct ChEBI chemical 
entities. Among them, 251 are transport reactions. In the 
corresponding IntEnz database (release 71) 3145 of the 
4596 enzyme entries (EC numbers) have their reactions 
described in Rhea. This corresponds to a total of 3658 
distinct Rhea reactions (as there may be a many-to-many 
relationship between reactions and enzymes). 

The database is updated by monthly releases. Updates 
are synchronized with ChEBI releases.

Table 2. Resources relating to biochemical reactions

Resource                     Location
---------------------------  --------------------------------------------------
MetaCyc (10); EcoCyc (13)    http://metacyc.org; http://ecocyc.org
BioPath (21)                 http://www.molecular-networks.com/biopath/
BKM-react (26)               http://bkm.tu-bs.de/
BRENDA (12)                  www.brenda-enzymes.org
ENZYME (3)                   http://enzyme.expasy.org/
ExplorEnz (2)                http://www.enzyme-database.org
EzCatDB (19)                 http://mbs.cbrc.jp/EzCatDB/
IntEnz (17)                  http://www.ebi.ac.uk/intenz
KEGG (11)                    http://www.genome.jp/kegg/
MACiE (16)                   http://www.ebi.ac.uk/thornton-srv/databases/MACiE/
Reactome (15)                http://www.reactome.org/
SABIO-RK (20)                http://sabio.villa-bosch.de/
UniPathway (14)              http://www.unipathway.org/pathway

[Figure 3: screenshot of the Rhea web page for RHEA:15133, with the sections 'Same participants, different directions' and 'Cross references'.]

Figure 3. Sample master reaction and associated directional reactions in Rhea, http://www.ebi.ac.uk/rhea/reaction.xhtml?id=15133. RHEA:15133 is
a master reaction. This master reaction is described by its chemicals (labels and identifiers in ChEBI), and has no specific direction. The three
associated directional reactions are indicated in the section titled 'Same participants, different directions'. These are: RHEA:15134 (left-to-right),
RHEA:15135 (right-to-left) and RHEA:15136 (bidirectional). The section titled 'Cross-references' lists all related resources to which one of the
directional reactions has been linked. The actual Rhea reaction can be determined from the directional icon ('=>' for RHEA:15134, '<=' for
RHEA:15135 and '<=>' for RHEA:15136). The user can retrieve a list of all the reactions a specific compound is involved in by clicking the
binocular symbol next to the compound name. Marvin is used for displaying chemical structures, Marvin 5.0.0, 2011 (http://www.chemaxon.com).



Rhea web server 

The Rhea web server (http://www.ebi.ac.uk/rhea) enables 
access to Rhea data. It provides browsing, searching, web 
services and download facilities. 

An example of a reaction page is shown in Figure 3.
It presents the chemical transformation (including
the reaction participants' names, ChEBI identifiers
and chemical structures), and related information
(cross-references and citations). The Rhea web site
provides a number of search facilities (described below),
and also allows users to navigate the reaction set via their
common compounds.

The Rhea website allows simple queries using any of the 
following as input: 

• a reaction identifier from Rhea or any of the
cross-referenced resources [KEGG (11), EcoCyc (13),
MetaCyc (10), UniPathway (14), MACiE (16) or
Reactome (15)];

• a compound name or identifier from ChEBI. The 
ChEBI index is used to resolve any synonyms and 
cross-references; 

• an equation describing the reaction. The Rhea search 
tool will attempt to parse this and return any matching 
reactions. If none is found, it will look for similar re- 
actions, i.e. with as many of the participants in the 
equation as possible; 

• an EC number, an enzyme name or a UniProtKB/ 
Swiss-Prot identifier. The IntEnz index is used to 
resolve any enzyme synonyms; 

• a bibliographic citation, or any part of one, such as an
author name, title, abstract or publication identifier
(PubMed identifier).
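
The equation search described in the list above (exact match first, then the reactions sharing the most participants) can be sketched as follows. This is not the Rhea search tool: it compares participant names as plain lower-cased strings, whereas the text states that the ChEBI index is used to resolve synonyms, and the function names and the two-entry reaction table (equations taken from Table 1 and Figure 1) are illustrative assumptions.

```python
import re

ARROW = re.compile(r"\s*(?:<=>|=>|<=|=)\s*")
PLUS = re.compile(r"\s\+\s")   # " + " between participants, not the "+" in "NAD+"

REACTIONS = {
    15133: "L-glutamate + H2O + NAD+ = 2-oxoglutarate + H+ + NADH + NH4+",
    22580: "4-hydroxy-2-oxopentanoate = 2-oxopent-4-enoate + H2O",
}


def parse_equation(text):
    """Split 'A + B = C + D' into two frozensets of participant names."""
    parts = ARROW.split(text.strip(), maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"no reaction arrow found in {text!r}")
    return tuple(
        frozenset(name.strip().lower() for name in PLUS.split(side) if name.strip())
        for side in parts
    )


def search(query, reactions=REACTIONS):
    """Return exact matches, or else reactions ranked by shared participants."""
    q_left, q_right = parse_equation(query)
    exact, ranked = [], []
    for rhea_id, equation in reactions.items():
        left, right = parse_equation(equation)
        if {left, right} == {q_left, q_right}:   # either orientation
            exact.append(rhea_id)
            continue
        overlap = len((q_left | q_right) & (left | right))
        if overlap:
            ranked.append((overlap, rhea_id))
    if exact:
        return exact
    return [rhea_id for _, rhea_id in sorted(ranked, key=lambda t: (-t[0], t[1]))]


print(search("2-oxoglutarate + NH4+ + NADH + H+ = L-glutamate + NAD+ + H2O"))
print(search("L-glutamate + NAD+ = 2-oxoglutarate + NADH"))
```

Splitting on a plus sign surrounded by whitespace keeps charged labels such as NAD+ and H+ intact.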

In addition, an advanced search page allows a search to 
be restricted to specific fields (e.g. reaction participants, 
cross-references or citations) and to perform structural 
searches. In this latter case, the user can import or 
draw a 2D-structure through the JChemPaint applet 
(jchempaint.sourceforge.net), following which the 
chemical structure search algorithm OrChem (22) will 
perform a substructure or similarity search on the set of 
compounds involved in Rhea. The complete documenta- 
tion of the structural search is available on the ChEBI web 
server (http://www.ebi.ac.uk/chebi/userManualForward 
.do). 

Downloads 

All Rhea data is available for free download (http://www 
.ebi.ac.uk/rhea/download.xhtml) in three formats. These 
are BioPAX level 2 (23), RXN and RD (24), which facili- 
tate the use of data by chemoinformatics software tools. 

BioPAX (23) is a collaborative effort to create a data 
exchange format for biological pathway data. It is defined 
in OWL and represented in RDF/XML syntax. Rhea
reactions correspond to the biochemicalReaction, transport
and transportWithBiochemicalReaction BioPAX classes.
An example of a BioPAX export is given in the
supplementary data section (Supplementary Figure S1). RXN is
one of the chemical table (CT) file formats specified by 
Accelrys (formerly Symyx and MDL) (24). RXN repre- 
sents unidirectional processes, so the Rhea export in RXN 
includes only this subset of Rhea reactions. RD (reaction 
data) is a second CT file format, consisting of a set of 
records, each record defining one directional reaction (in 
RXN format) and associated data. 
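
The record structure of an RD file can be illustrated with a small splitter. This is a simplified sketch based on the general CTfile conventions (records introduced by a $RFMT line, data items given as $DTYPE/$DATUM pairs); it is not a full parser, it ignores multi-line $DATUM values, and the field name Rhea_ID and the sample text are hypothetical rather than taken from an actual Rhea export.

```python
SAMPLE = """$RDFILE 1
$DATM 01/12/11 00:00
$RFMT
$RXN
(reaction block omitted)
$DTYPE Rhea_ID
$DATUM 15134
$RFMT
$RXN
(reaction block omitted)
$DTYPE Rhea_ID
$DATUM 15135
"""


def split_rd_records(text):
    """Split RD text into records with their RXN lines and data items."""
    records, current, key = [], None, None
    for line in text.splitlines():
        if line.startswith("$RFMT"):          # start of a new record
            current = {"rxn_lines": [], "data": {}}
            records.append(current)
            key = None
        elif current is None:
            continue                          # file header ($RDFILE, $DATM)
        elif line.startswith("$DTYPE"):
            key = line[len("$DTYPE"):].strip()
            current["data"][key] = ""
        elif line.startswith("$DATUM") and key is not None:
            current["data"][key] = line[len("$DATUM"):].strip()
        elif key is None:
            current["rxn_lines"].append(line) # still inside the RXN block
        # continuation lines of a $DATUM value are dropped in this sketch
    return records


for record in split_rd_records(SAMPLE):
    print(record["data"], len(record["rxn_lines"]))
```

For real use, the RXN block collected in rxn_lines would be handed to a chemoinformatics toolkit rather than parsed by hand.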

The 2D structures of the subset of ChEBI compounds 
referenced by Rhea are also available as an SDF file, a 
chemical format specified by Accelrys (formerly by MDL, 
24). That format includes information about the atoms, 
bonds, connectivity and coordinates of each of the 
molecules of interest. 

Web services 

The Rhea resource is exposed as RESTful web services.
Results of standard HTTP GET requests are sent
to the client (e.g. a web browser) in one of the RXN (24),
BioPAX Level 2 (23) or CMLReact (25) formats, depending
on the HTTP Accept header or the URL of the
request.

The service makes available two methods: one for
general searching of reactions and another for retrieving
the full entry of a reaction.

The main use case of the Rhea web service is a client
application that invokes the search method to get a list
of reaction URLs in the format it requires, and then
accesses these URLs to retrieve the full reaction entries.
Another use case is a client application that acquires a
Rhea identifier referenced in another service and then
creates the URL, specifying the format it requires, to
retrieve the full entry of the reaction. The service URL
and the instructions on how to access the service can be
found at http://www.ebi.ac.uk/rhea/rest/1.0.
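
The second use case (building a request for a known Rhea identifier and choosing the format through the Accept header) might look like the sketch below. Only the base URL comes from the text; the /reaction/<id> path and the media types are assumptions made for illustration, so the service instructions should be consulted for the real values. The request is constructed but deliberately not sent.

```python
from urllib.request import Request

BASE_URL = "http://www.ebi.ac.uk/rhea/rest/1.0"   # service URL given in the text

# Hypothetical media types for the three formats (assumptions).
ACCEPT = {
    "rxn": "chemical/x-mdl-rxnfile",
    "biopax2": "application/rdf+xml",
    "cmlreact": "application/xml",
}


def reaction_request(rhea_id: int, fmt: str) -> Request:
    """Build (but do not send) a GET request for one reaction entry."""
    url = f"{BASE_URL}/reaction/{rhea_id}"        # hypothetical path
    return Request(url, headers={"Accept": ACCEPT[fmt]}, method="GET")


request = reaction_request(15134, "rxn")
print(request.get_method(), request.full_url, request.get_header("Accept"))
# To execute it: urllib.request.urlopen(request).read()
```

Selecting the format in a header keeps one URL per reaction, while the alternative mentioned in the text (encoding the format in the URL) is easier to use from a browser.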

Software 

The Rhea database architecture and software tools are
distributed as Open Source (at http://sourceforge.net/projects/rhea-ebi/),
allowing end-users to download and
install their own local database of reactions. All
software is written in Java 6.
The software package includes, amongst other things:

• Database schema: allows the storage and validation
of data.

• Domain model and data validator: provide a
reaction model and the ability to validate a reaction.

• Rhea annotation tool: allows curators
to add and modify their own reactions.

• Rhea public website: allows reaction visualization.

The Rhea database runs on Oracle 11g (http://www.oracle.com).
However, minor tweaks allow it to run on
open database platforms such as MySQL (www.mysql.com).
The database schema is given in the supplementary
data section (Supplementary Figure S2 and
Supplementary Table S1).

Future directions 

Collaborative developments with the Universal Protein
Resource KnowledgeBase, UniProtKB (18), are ongoing
with the aim of using Rhea as a reference vocabulary
for describing enzymatic reactions within UniProtKB
protein sequence records. Rhea also aims at serving as a
general resource of chemical reactions for the reconstruction
of genome-scale metabolic networks, as in the
Microme (http://www.microme.eu) and MetaNetX
(http://www.metanetx.org) initiatives. We are examining
a number of curated metabolic networks in order to 
identify missing reactions of interest and to curate these 
in Rhea, with the aim of enhancing the unique content of 
this reaction resource. We are also developing and 
enhancing our submission tools to allow batch submis- 
sion, thereby speeding up the population of Rhea with 
new reactions. We will also further exploit the chemical 
ontology of ChEBI, where relationships between reaction 
participants can serve as a basis for the development of a 
logical reaction classification. 






SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 
Supplementary Figures 1-2 and Table 1.